Scour
🎯 Bit-Vector Algorithms
Sparse Sets, Boolean Operations, Compression, Performance
Scoured 159858 posts in 10.2 ms
mmgehlot/bitpolar: BitPolar: near-optimal vector quantization — 3-8 bit compression with zero training. 58 integrations across every major AI framework.
🎯 Bit Vectors · github.com · 3d · Hacker News · …
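The "zero training" claim in headlines like this usually means the quantizer derives its parameters from the data itself rather than from a calibration pass. A minimal sketch of training-free uniform scalar quantization, in plain Python (this is not BitPolar's actual algorithm; the function names and the uniform scheme are assumptions for illustration):

```python
def quantize(vec, bits):
    """Training-free uniform scalar quantization to `bits` bits per value.
    Scale and offset come from the vector itself, so no calibration is needed."""
    levels = (1 << bits) - 1                 # e.g. 7 levels at 3 bits
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((x - lo) / scale) for x in vec]   # ints in [0, levels]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate values from integer codes."""
    return [c * scale + lo for c in codes]
```

At 3 bits each value collapses to one of 8 levels, and the round-trip error of this scheme is bounded by half a quantization step (scale / 2).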
Metal Quantized Attention: pulling M5 Max ahead with Int8 matrix multiplication
🗺️ Region Inference · releases.drawthings.ai · 1d · Hacker News · …
Towards Formal Security Proofs of MQOM
📡 Binary Protocols · eprint.iacr.org · 2d · …
TIL: Quantisation
∀ Quantified Types · anup.io · 5d · …
Fujitsu One Compression (LLM Quantization)
📦 Compression Algorithms · fujitsuresearch.github.io · 1d · Hacker News · …
Geekbench investigates up to 30% jump with Intel's iBOT — performance gain attributed to newly-vectorized instructions
⚡ Instruction Fusion · tomshardware.com · 2d · …
Iteratively optimizing an SPSC queue
🎯 Ring Buffers · blog.c21-mac.com · 4d · r/cpp · …
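Posts in this genre typically iterate on a power-of-two ring buffer where head and tail indices only ever increase, so `index & mask` replaces a modulo and `tail - head` distinguishes full from empty without wasting a slot. A sketch of that core arithmetic (illustrative only; the class name is invented, and a real SPSC queue also needs atomic acquire/release index publication, which plain Python does not model):

```python
class SpscRing:
    """Single-producer/single-consumer ring buffer sketch."""

    def __init__(self, capacity):
        # Power-of-two capacity lets `index & mask` stand in for `index % capacity`.
        assert capacity > 0 and capacity & (capacity - 1) == 0
        self._buf = [None] * capacity
        self._mask = capacity - 1
        self._head = 0  # advanced only by the consumer
        self._tail = 0  # advanced only by the producer

    def push(self, item):
        if self._tail - self._head == len(self._buf):
            return False                     # full: fill level == capacity
        self._buf[self._tail & self._mask] = item
        self._tail += 1
        return True

    def pop(self):
        if self._head == self._tail:
            return None                      # empty: indices have met
        item = self._buf[self._head & self._mask]
        self._head += 1
        return item
```

Because the indices never wrap in the stored representation, full (`tail - head == capacity`) and empty (`head == tail`) are unambiguous, which is the usual first optimization such posts walk through.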
Beating Python’s GIL: Achieving a 130x Speedup in Batch Processing with Rust and Rayon
🦀 MIR Optimization · medium.com · 2d · …
MXFP8 GEMM: Up to 99% of cuBLAS Performance Using CUDA and PTX
🔬 Nanopasses · danielvegamyhre.github.io · 5d · Hacker News · …
Building a Production-Grade Vector Database in Rust: What We Shipped
🚂 Cranelift Backend · ferres.io · 2d · DEV · …
Accelerate CPU-based AI inference workloads using Intel AMX on Amazon EC2
🗺️ Region Inference · aws.amazon.com · 3d · …
jhammant/Turbo1bit: Turbo1Bit: Combining 1-bit LLM weights (Bonsai) with TurboQuant KV cache compression for maximum inference efficiency. 4.2x KV cache compression + 16x weight compression = ~10x total memory reduction.
🗺️ Region Inference · github.com · 4h · Hacker News · …
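The two ratios apply to different memory pools, so they do not multiply: the total reduction is a harmonic-style combination weighted by each pool's share of baseline memory. A quick way to check what share the "~10x" figure implies (the 80% weight share below is my assumption for illustration, not something the repo states):

```python
def combined_reduction(weight_frac, weight_ratio=16.0, kv_ratio=4.2):
    """Total memory reduction when weights shrink by `weight_ratio` and the
    KV cache by `kv_ratio`. `weight_frac` is the fraction of baseline memory
    held by weights; the remainder is KV cache."""
    kv_frac = 1.0 - weight_frac
    # New memory per unit of old memory is the weighted sum of the inverses.
    return 1.0 / (weight_frac / weight_ratio + kv_frac / kv_ratio)
```

With an 80/20 weights-to-KV split, `combined_reduction(0.8)` comes out just over 10x, consistent with the headline; a 50/50 split would give well under 7x, so the claim only holds when weights dominate baseline memory.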
Speculative Decoding: Performance or Illusion?
🗺️ Region Inference · specdecode-bench.github.io · 6d · Hacker News · …
Discord Engineers Add Distributed Tracing to Elixir's Actor Model Without Performance Penalty
✨ Gleam · infoq.com · 5d · …
APL Performance
🔀 SIMD Programming · aplwiki.com · 3d · Hacker News · …
Rethinking r-PKP: a New Formulation for the Relaxed Permuted Kernel Problem
⚡ Partial Evaluation · eprint.iacr.org · 2d · …
Pure C implementation of the TurboQuant paper (ICLR 2026) for KV cache compression in LLM inference.
🗺️ Region Inference · github.com · 1d · r/LocalLLaMA · …
Finding-Fortune/Binary-Cellular-Automata: The Cellular Automata algorithm for cave generation computed with binary operations for a massive performance speed-up. >10x faster than other noise libraries at cave generation.
⚡ Cache-Aware Algorithms · github.com · 2d · r/proceduralgeneration · …
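The bit-packed idea behind this kind of speed-up: store each grid row as one integer (bit c = wall at column c), fetch all eight neighbor masks with shifts, and sum them with a bit-sliced carry-save adder so every column updates in the same handful of bitwise ops. A sketch under stated assumptions (the ≥5-of-8 wall rule and walls outside the border are common cave-CA choices, not necessarily this repo's exact rule):

```python
def step(grid, w):
    """One cellular-automaton step on a bit-packed grid.
    grid: list of ints; bit c of grid[r] set means cell (r, c) is a wall.
    Out-of-bounds cells count as walls. Rule: wall iff >= 5 of 8 neighbors are walls."""
    mask = (1 << w) - 1
    full = mask  # an all-wall row, used for the virtual rows above/below the grid

    def shl(x):  # each cell's left-neighbor bit; the border neighbor is a wall
        return ((x << 1) | 1) & mask

    def shr(x):  # each cell's right-neighbor bit; the border neighbor is a wall
        return (x >> 1) | (1 << (w - 1))

    out = []
    for r, cur in enumerate(grid):
        above = grid[r - 1] if r > 0 else full
        below = grid[r + 1] if r + 1 < len(grid) else full
        neighbors = [shl(above), above, shr(above),
                     shl(cur), shr(cur),
                     shl(below), below, shr(below)]
        # Bit-sliced addition: per-column neighbor counts held in 4 bit-planes.
        ones = twos = fours = eights = 0
        for x in neighbors:
            c1 = ones & x;  ones ^= x
            c2 = twos & c1; twos ^= c1
            c3 = fours & c2; fours ^= c2
            eights |= c3
        # count >= 5 per column: binary 1000, or 01xx with a low bit set.
        out.append((eights | (fours & (twos | ones))) & mask)
    return out
```

Every column of a row is updated by the same ~30 bitwise operations regardless of width, which is where the claimed order-of-magnitude win over per-cell loops comes from.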
On the properties of arithmetic crosscorrelation for sequences with coprime periods
🎯 Bit Vectors · eprint.iacr.org · 4d · …
castnettech/mnemosyne: LLM context compression and retrieval engine. Zero dependencies. Sub-100ms queries. 40-70% token reduction.
🔄 Subinterpreters · github.com · 5d · r/SideProject · …